Background and Context

Thera Bank recently saw a steep decline in the number of users of its credit card. Credit cards are a good source of income for banks because of the different fees they charge, such as annual fees, balance transfer fees, cash advance fees, late payment fees, and foreign transaction fees. Some fees are charged to every user irrespective of usage, while others are charged only under specified circumstances.

Customers leaving the credit card service would cost the bank revenue, so the bank wants to analyze its customer data, identify the customers who will leave, and understand their reasons for leaving, so that it can improve in those areas.

As a data scientist at Thera Bank, you need to come up with a classification model that will help the bank improve its services so that customers do not give up their credit cards.

Objective

  1. Explore and visualize the dataset.
  2. Build a classification model to predict whether a customer is going to churn.
  3. Optimize the model using appropriate techniques.
  4. Generate a set of insights and recommendations that will help the bank.

Data Dictionary:

Review Sample of data

Check the shape of the data, the data types, and the number of non-null values for each column.
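A minimal sketch of these checks with pandas. The toy frame and its values are invented for illustration; only the column names `Attrition_Flag` and `Total_Amt_Chng_Q4_Q1` appear in this notebook.

```python
import pandas as pd

# Toy stand-in for the bank data; the values are made up.
df = pd.DataFrame({
    "Attrition_Flag": ["Existing Customer", "Attrited Customer", "Existing Customer"],
    "Total_Amt_Chng_Q4_Q1": [1.335, 0.541, None],
})

print(df.shape)                # (number of rows, number of columns)
df.info()                      # dtype and non-null count for each column
non_null = df.notna().sum()    # the same non-null counts as a Series
print(non_null)
```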

Insights

Summary of Data

Insights

Insights

Insights

Convert the object-type variables to categorical variables to save memory
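One way this conversion might look, assuming every non-numeric column should become a category. The toy data is invented, and the memory comparison only illustrates why the conversion can help on repetitive text columns.

```python
import pandas as pd

# Toy data: one repetitive text column and one numeric column.
df = pd.DataFrame({
    "Attrition_Flag": ["Existing Customer", "Attrited Customer"] * 500,
    "Total_Amt_Chng_Q4_Q1": [0.7, 1.1] * 500,
})

before = df.memory_usage(deep=True).sum()

# Treat every non-numeric column as a candidate for the category dtype.
text_cols = [c for c in df.columns if not pd.api.types.is_numeric_dtype(df[c])]
for col in text_cols:
    df[col] = df[col].astype("category")

after = df.memory_usage(deep=True).sum()
print(f"memory before: {before} bytes, after: {after} bytes")
```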

Insights

Review counts of the category variables
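A short sketch of reviewing category counts, on invented data. It prints both raw counts and proportions for every column with the category dtype.

```python
import pandas as pd

# Toy data; the 4-to-1 split is arbitrary.
df = pd.DataFrame({
    "Attrition_Flag": pd.Categorical(
        ["Existing Customer"] * 4 + ["Attrited Customer"]
    ),
})

for col in df.select_dtypes(include="category").columns:
    print(df[col].value_counts())                          # raw counts
    print(df[col].value_counts(normalize=True).round(3))   # proportions

counts = df["Attrition_Flag"].value_counts()
```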

Insights

Drop client variable since it doesn't provide value
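A sketch of the drop, assuming the client variable is a per-row identifier. The column name `client_id` is hypothetical; substitute the actual name in the dataset.

```python
import pandas as pd

df = pd.DataFrame({
    "client_id": [1001, 1002, 1003],   # hypothetical identifier column
    "Attrition_Flag": ["Existing Customer", "Attrited Customer", "Existing Customer"],
})

# An identifier that is unique per row carries no signal a model can generalize from.
print(df["client_id"].is_unique)

df = df.drop(columns=["client_id"])
```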

Perform an Exploratory Data Analysis on the data

Univariate analysis

Insights

Insights

Insights

Insights

Insights

Insights

Insights

Insights

Insights

- Total_Amt_Chng_Q4_Q1 is slightly right-skewed.

Insights

Insights

Insights

Insights

Review Categorical Variables

Function to create Bar Charts of percentages

Insights

Insights

Insights

Insights

Insights

Insights

Bivariate analysis

Review pairplot for continuous variables

Insights

Review Correlation Heat Map

Insights

Review crosstabs of the categorical variables with the proportion of Attrition_Flag
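A minimal crosstab sketch with row-wise proportions. The `Gender` column and all values are hypothetical stand-ins for whichever categorical variables the dataset contains.

```python
import pandas as pd

df = pd.DataFrame({
    "Gender": ["F", "F", "F", "M", "M", "M", "M", "M"],   # hypothetical column
    "Attrition_Flag": [
        "Attrited Customer", "Existing Customer", "Existing Customer",
        "Attrited Customer", "Existing Customer", "Existing Customer",
        "Existing Customer", "Existing Customer",
    ],
})

# normalize="index" turns each row into proportions that sum to 1,
# so attrition rates are comparable across categories of different size.
tab = pd.crosstab(df["Gender"], df["Attrition_Flag"], normalize="index")
print(tab)
```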

Insights

Insights

Insights

Insights

Insights

Insights

Review the continuous variables using distribution plots

Insights

Insights

Insights

Insights

Insights

Insights

Insights

Insights

Insights

Insights

Insights

Insights

Insights

Create profile for Attrition status
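One possible way to build such a profile is to summarize the numeric variables per attrition status. This is a sketch on invented values, not necessarily the approach used in this notebook.

```python
import pandas as pd

df = pd.DataFrame({
    "Attrition_Flag": [
        "Existing Customer", "Existing Customer",
        "Attrited Customer", "Attrited Customer",
    ],
    "Total_Amt_Chng_Q4_Q1": [0.8, 1.0, 0.5, 0.7],   # made-up values
})

# Mean and median of each numeric column, split by attrition status.
profile = df.groupby("Attrition_Flag")[["Total_Amt_Chng_Q4_Q1"]].agg(["mean", "median"])
print(profile)
```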

Insights on profiles of Customers

Data Pre-processing

Prepare the data for analysis: missing value treatment, outlier detection (treat if needed, and explain why or why not), feature engineering, and preparing the data for modeling

Create Maps to the bins for different variables

Remap Variables
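A sketch of mapping ordered bins to integer codes. The column name `Income_Level` and the bin labels are hypothetical; the actual maps depend on the categories present in the data.

```python
import pandas as pd

# Hypothetical ordered bins and their integer codes.
income_map = {"Low": 0, "Medium": 1, "High": 2}

df = pd.DataFrame({"Income_Level": ["Low", "High", "Medium", "Low"]})

# Series.map replaces each label with its code; labels missing from the
# dictionary would become NaN, so count them as a sanity check.
df["Income_Level_Code"] = df["Income_Level"].map(income_map)
unmapped = int(df["Income_Level_Code"].isna().sum())
print(df)
print("unmapped labels:", unmapped)
```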

One hot encoding for the remaining category variables
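A minimal one-hot encoding sketch with `pd.get_dummies`. The `Gender` and `Credit_Limit` columns are hypothetical. Dropping the first level is an assumption made here to avoid a redundant column; it is not confirmed by this notebook.

```python
import pandas as pd

df = pd.DataFrame({
    "Gender": ["M", "F", "F"],               # hypothetical categorical column
    "Credit_Limit": [12000.0, 3500.0, 8000.0],   # hypothetical numeric column
})

# drop_first=True removes one level per variable; dtype=int gives 0/1 columns.
encoded = pd.get_dummies(df, columns=["Gender"], drop_first=True, dtype=int)
print(encoded)
```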

Check to see all the data types are numeric before model building

Insights

Drop unneeded variables

Insights

Split the Data
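A sketch of a stratified train/test split on synthetic data. The 70/30 ratio and the 20% positive rate are arbitrary choices for illustration, not values taken from this notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic features and an imbalanced binary target (20 positives out of 100).
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = np.array([1] * 20 + [0] * 80)

# stratify=y keeps the class proportions the same in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=1
)
print(y_train.mean(), y_test.mean())
```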

Functions to create Confusion Matrix and Scoring Metrics
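A sketch of what such a helper might look like using scikit-learn metrics. The function name `score_predictions` is hypothetical, and the labels below are invented to keep the arithmetic easy to check by hand.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)


def score_predictions(y_true, y_pred):
    """Return the confusion matrix and a dict of common classification metrics."""
    # Rows are true classes, columns are predicted classes, in the order [0, 1].
    cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
    metrics = {
        "accuracy": accuracy_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred, zero_division=0),
        "precision": precision_score(y_true, y_pred, zero_division=0),
        "f1": f1_score(y_true, y_pred, zero_division=0),
    }
    return cm, metrics


y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1]
cm, metrics = score_predictions(y_true, y_pred)
print(cm)
print(metrics)
```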

Create upsampled and downsampled data sets for Logistic Regression

SMOTE to upsample smaller class

Down Sampling the larger class
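A sketch of random downsampling with plain pandas, assuming the target has already been encoded as 0/1 (an assumption; the encoding is not shown here). The notebook may use a dedicated resampling library instead. Resampling should be applied to the training split only, so the test set keeps its real class balance.

```python
import pandas as pd

# Toy training frame: 20 attrited (1) and 80 existing (0) customers.
train = pd.DataFrame({
    "feature": range(100),
    "Attrition_Flag": [1] * 20 + [0] * 80,
})

minority = train[train["Attrition_Flag"] == 1]
majority = train[train["Attrition_Flag"] == 0]

# Randomly keep as many majority rows as there are minority rows.
majority_down = majority.sample(n=len(minority), random_state=1)

# Combine and shuffle so the classes are not grouped together.
balanced = pd.concat([majority_down, minority]).sample(frac=1, random_state=1)
print(balanced["Attrition_Flag"].value_counts())
```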

Create Logistic Regression Model on regular data set
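A minimal logistic regression sketch on synthetic, imbalanced data generated with `make_classification`. The sample size, class weights, and `max_iter` are arbitrary illustration values, not settings from this notebook.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the prepared data, with roughly 20% positives.
X, y = make_classification(
    n_samples=500, n_features=6, n_informative=3,
    weights=[0.8, 0.2], random_state=1,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=1
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

pred = model.predict(X_test)
test_recall = recall_score(y_test, pred, zero_division=0)
print("test accuracy:", model.score(X_test, y_test))
print("test recall:", test_recall)
```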

Scoring our Logistic Regression Model

Insights

Create Logistic Regression Model on Upsampled data set

Insights

Create Logistic Regression Model on Downsampled data set

Insights

Build Decision Tree Model

Insights

Insights

Insights

Build Random Forest Model

Insights

Insights

Insights

Build Bagging Model

Insights

Insights

Insights

Build AdaBoost Model

Insights

Insights

Insights

Build GradientBoosting Model

Insights

Insights

Insights

Build XGBoost model

Insights

Insights

Insights

Comparing all models
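A sketch of collecting test metrics for several models into one table. It uses synthetic data and default hyperparameters, and covers only the scikit-learn models named above; XGBoost is left out to keep the example free of extra dependencies. Sorting by recall is an assumption about which metric matters most for churn, not a choice confirmed by this notebook.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    BaggingClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the prepared data.
X, y = make_classification(
    n_samples=400, n_features=6, n_informative=3,
    weights=[0.8, 0.2], random_state=1,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=1
)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=1),
    "Random Forest": RandomForestClassifier(random_state=1),
    "Bagging": BaggingClassifier(random_state=1),
    "AdaBoost": AdaBoostClassifier(random_state=1),
    "Gradient Boosting": GradientBoostingClassifier(random_state=1),
}

rows = []
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    rows.append({
        "model": name,
        "accuracy": accuracy_score(y_test, pred),
        "recall": recall_score(y_test, pred, zero_division=0),
        "precision": precision_score(y_test, pred, zero_division=0),
    })

# One row per model, ordered by recall on the positive class.
comparison = (
    pd.DataFrame(rows).set_index("model").sort_values("recall", ascending=False)
)
print(comparison)
```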

Summary of Comparisons

Business Recommendation